ABSTRACT


Personalized learning recognizes that the causal effect of a
learning intervention may differ from student to student, so
inferring the causal effects of the studied interventions is
a central problem. In this paper we propose the Residual
Counterfactual Networks (RCN) for answering counterfactual
inference questions such as "Would this particular student
benefit more from the video hint or the text hint when the
student cannot solve a problem?" The model learns a balancing
representation of students by minimizing the distance between
the distributions of the control and the treated populations,
and then uses a residual block to estimate the individual
treatment effect based on the representation of the student.
We run experiments on semi-simulated datasets and real-world
educational online experiment datasets to evaluate the
efficacy of our model. The results show that our model
matches or outperforms the state of the art.


Keywords 
Counterfactual inference, deep residual learning, educational 
experiments, individual treatment effect 


1. INTRODUCTION 


The goal of personalized learning is to provide pedagogy,
curriculum, and learning environments that meet the needs
of individual students. For example, an Intelligent Tutoring
System (ITS) decides which hint would most benefit a specific
student. If the ITS could infer what the student's performance
would be after receiving each hint, it could simply choose
the hint that leads to the best performance for that student.
To make this possible, we might run an online educational
experiment that randomly assigns students to one of the hints
and records their performance. Predicting the causal effects
of the possible interventions (e.g., the available hints) then
becomes the central problem. In this paper we focus on the
task of answering counterfactual questions [8] such as "Would
this particular student benefit more from the video hint or
the text hint when the student cannot solve a problem?"


There are two ways of collecting data for counterfactual in- 
ference: randomized control trials (RCTs) and observational 
studies. In RCTs, participants (e.g. students) are randomly 
assigned to interventions (e.g. video hints or text hints), 
while participants in observational studies are not necessarily
randomly assigned to interventions. For example, consider
an experiment evaluating the efficacy of video hints and
text hints for a certain problem. In an RCT design,
students who need a hint would be randomly assigned to 
either the video hints or the text hints. In an observational 
study, students are assigned to one of the interventions based 
on their contextual information, such as knowledge level or 
personal preference. 


[5] proposed Balancing Neural Networks (BNN), which can
be applied to the counterfactual inference problem. They
used a regularizer to enforce similarity between the
distributions of the representations learned for populations
that received different interventions, for example, the
representations of students who received text hints versus
those who received video hints. This reduces the variance
that arises from fitting a model on one distribution and
applying it to another. Because interventions are randomly
assigned in RCTs, the distributions of the populations under
different interventions are very likely to be similar. In an
observational study, however, we may end up in a situation
where only male students receive video hints and only female
students receive text hints. Without enforcing similarity
between the distributions of the representations of male and
female students, it is not safe to predict the outcome of
male students receiving text hints. In machine learning,
"domain adaptation" [7] refers to learning under this kind of
dissimilarity between the distributions of the training data
and the test data.


Recent work [6] has demonstrated that (deep) neural networks
combined with domain adaptation approaches produce
outstanding results on some domain adaptation benchmark
datasets. Motivated by that work, we propose the Residual
Counterfactual Networks (RCN) for counterfactual inference
to estimate the individual treatment effect, and we evaluate
its efficacy on both a simulated dataset and a real-world
dataset from an educational online experiment. The RCN
extends the BNN by adding a residual block that estimates the
individual treatment effect (ITE) based on the learned
representation of participants. The idea of the residual
block originates from state-of-the-art deep residual
learning [2]. We enable the estimation of the ITE by plugging
several layers into the neural network to explicitly learn
the residual function with reference to the learned
representation.


The rest of the paper is organized as follows. Section 2 pro- 
vides an overview of the problem setup of counterfactual 
inference for estimating the ITE. Section 3 describes our
model in detail. Section 4 gives an overview of related
work in this research area. Section 5 describes the datasets 
and evaluation metrics used to test our model. Section 6 
presents the results of our model and compares them with 
other models. Finally, we discuss the results and conclude 
the paper. 


2. PROBLEM SETUP 


Let $\mathcal{T}$ be the set of proposed interventions we wish
to consider, $\mathcal{X}$ the set of participants, and
$\mathcal{Y}$ the set of possible outcomes. For each proposed
intervention $t \in \mathcal{T}$, let $Y_t(x) \in \mathcal{Y}$
be the potential outcome for $x$ when $x$ is assigned to
intervention $t$. In both randomized control trials (RCTs) and
observational studies, only one outcome is observed for a
given participant $x$; even if the participant is given one
intervention and later the other, the participant is no
longer in the same state. In machine learning, "bandit
feedback" refers to this kind of partial feedback. The model
described above is also known as the Rubin-Neyman causal
model [11, 10].


We focus on a binary intervention set $\mathcal{T} = \{0, 1\}$,
where intervention 1 is often referred to as the "treated"
and intervention 0 as the "control." In this scenario the ITE
for a participant $x$ is the quantity $Y_1(x) - Y_0(x)$.
Knowing this quantity helps assign participant $x$ to the
better of the two interventions whenever a decision is
needed, for example, choosing the better intervention for a
specific student who has trouble solving a problem. However,
we cannot calculate the ITE directly, because we can only
observe the outcome of one of the two interventions.


In this work we follow the common simplifying assumption
of no hidden confounding variables. This means that all the
factors determining the outcome of each intervention are
observed. This assumption can be formalized as the strong
ignorability condition:


$$(Y_1, Y_0) \perp t \mid x, \quad 0 < p(t = 1 \mid x) < 1, \ \forall x.$$


Note that we cannot evaluate the validity of strong ignor- 
ability from data, and the validity must be determined by 
domain knowledge. 


In the "treated" and "control" setting, we refer to the
observed and unobserved outcomes as the factual outcome
$y^F(x)$ and the counterfactual outcome $y^{CF}(x)$,
respectively. In other words, when participant $x$ is
assigned to the "treated" ($t = 1$), $y^F(x)$ is equal to
$Y_1(x)$ and $y^{CF}(x)$ is equal to $Y_0(x)$. The other way
around, when $x$ is assigned to the "control" ($t = 0$),
$y^F(x)$ is equal to $Y_0(x)$ and $y^{CF}(x)$ is equal to
$Y_1(x)$.


Given $n$ samples $\{(x_i, t_i, y_i^F)\}_{i=1}^{n}$, where
$y_i^F = t_i \cdot Y_1(x_i) + (1 - t_i) Y_0(x_i)$, a common
approach for estimating the ITE is to learn a function
$f : \mathcal{X} \times \mathcal{T} \to \mathcal{Y}$ such that
$f(x_i, t_i) \approx y_i^F$. The estimated ITE is then:


$$\widehat{ITE}(x_i) = \begin{cases} y_i^F - f(x_i, 1 - t_i), & t_i = 1, \\ f(x_i, 1 - t_i) - y_i^F, & t_i = 0. \end{cases}$$
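The case distinction in the estimated ITE can be illustrated with a minimal Python sketch. The outcome model `toy_model` and the sample values are hypothetical and serve only to make the example runnable; any learned function f(x, t) could take its place.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def estimate_ite(x: Features, t: int, y_factual: float,
                 f: Callable[[Features, int], float]) -> float:
    """Estimated ITE from one factual sample and an outcome model f(x, t)."""
    # Predict the outcome under the intervention the participant did NOT receive.
    y_counterfactual = f(x, 1 - t)
    if t == 1:
        # Treated: factual outcome minus predicted control outcome.
        return y_factual - y_counterfactual
    # Control: predicted treated outcome minus factual outcome.
    return y_counterfactual - y_factual


def toy_model(x: Features, t: int) -> float:
    """Hypothetical outcome model, not taken from the paper."""
    return 0.5 * x[0] + 0.2 * t


# (x_i, t_i, y_i^F) triples with made-up values.
samples = [([1.0], 1, 0.9), ([2.0], 0, 1.0)]
ites = [estimate_ite(x, t, y, toy_model) for x, t, y in samples]
print(ites)
```

In both branches the factual outcome is used as observed, and only the counterfactual outcome comes from the model.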


We assume the $n$ samples $\{(x_i, t_i, y_i^F)\}_{i=1}^{n}$ form
an empirical distribution $\hat{p}^F = \{(x_i, t_i)\}_{i=1}^{n}$.
We call this empirical distribution $\hat{p}^F \sim p^F$ the
empirical factual distribution. In order to calculate the
ITE, we need to infer the counterfactual outcome, which
depends on the empirical distribution
$\hat{p}^{CF} = \{(x_i, 1 - t_i)\}_{i=1}^{n}$. We call this
empirical distribution $\hat{p}^{CF} \sim p^{CF}$ the empirical
counterfactual distribution. $p^F$ and $p^{CF}$ may not be
equal, because the distributions of the control and the
treated populations may differ. As a result, counterfactual
inference has to be carried out over a distribution different
from the one observed in the experiment. In machine learning
terms, this scenario is usually referred to as domain
adaptation, where the distribution of features in the test
data differs from the distribution of features in the
training data.


3. MODEL 


We propose the RCN to estimate the individual treatment
effect using counterfactual inference. The RCN first learns
a balancing representation of deep features
$\Phi : \mathcal{X} \to \mathbb{R}^d$, and then learns a
residual mapping $\Delta f$ on the representation to estimate
the ITE. The structure of the RCN is shown on the left side
of Figure 1.


To learn a representation of deep features $\Phi$, the RCN
uses fully connected layers with the ReLU activation
function, where $\mathrm{ReLU}(z) = \max(0, z)$. To estimate
the counterfactual outcome accurately, the feature
representation $\Phi$ must generalize from the factual
distribution to the counterfactual distribution. Common
successful approaches to domain adaptation encourage
similarity between the latent feature representations w.r.t.
the different distributions. This similarity is often
enforced by minimizing a certain distance between the
domain-specific hidden features. The distance between two
distributions is usually referred to as the discrepancy
distance, introduced by [7], which is a
hypothesis-class-dependent distance measure tailored for
domain adaptation.


In this paper we use an Integral Probability Metric (IPM)
to measure the distance between two distributions
$p_0 = p(x \mid t = 0)$ and $p_1 = p(x \mid t = 1)$, also known
as the control and treated distributions. The IPM for $p_0$
and $p_1$ is defined as


$$\mathrm{IPM}_{\mathcal{F}}(p_0, p_1) := \sup_{f \in \mathcal{F}} \left| \int_S f \, dp_0 - \int_S f \, dp_1 \right|,$$


where $\mathcal{F}$ is a class of real-valued bounded
measurable functions on $S$.

The choice of the function class $\mathcal{F}$ is the crucial
distinction between IPMs [15]. Two specific IPMs are used in
our experiments: the Maximum Mean Discrepancy (MMD) and the
Wasserstein distance. When
$\mathcal{F} = \{f : \|f\|_{\mathcal{H}} \le 1\}$, where
$\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS)
with $k$ as its reproducing kernel, the IPM is called the
MMD. In other words, the family of RKHS functions with norm
at most 1 leads to the MMD. The family of 1-Lipschitz
functions $\mathcal{F} = \{f : \|f\|_L \le 1\}$, where
$\|f\|_L$ is the Lipschitz semi-norm of a bounded continuous
real-valued function $f$, makes the IPM the Wasserstein
distance. Both the Wasserstein and MMD metrics have
consistent estimators that can be computed efficiently in
the finite-sample case [14]. The important property of these
IPMs is that $p_0 = p_1$ iff
$\mathrm{IPM}_{\mathcal{F}}(p_0, p_1) = 0$.


Figure 1: (left) Residual Counterfactual Networks for counterfactual inference. IPM is adopted on layers fc1
and fc2 to minimize the discrepancy distance of the deep features of the control and the treated populations.
For the treated group, we add a residual block fcr1-fcr2 so that $f_T(x) = f_C(x) + \Delta f(x)$; (right) Residual block,
$F(x) = x + \Delta F(x)$.
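As a rough illustration of a finite-sample IPM estimate, the following sketch computes the biased (V-statistic) estimate of the squared MMD between a control sample and a treated sample. The Gaussian (RBF) kernel, the bandwidth `sigma`, and the toy data are assumptions made for this example; the paper does not state which kernel or estimator was used.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def rbf_kernel(a: Point, b: Point, sigma: float = 1.0) -> float:
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mean_kernel(xs: List[Point], ys: List[Point], sigma: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(control: List[Point], treated: List[Point],
                sigma: float = 1.0) -> float:
    """Biased estimate of MMD^2: E[k(c,c')] + E[k(t,t')] - 2 E[k(c,t)]."""
    return (mean_kernel(control, control, sigma)
            + mean_kernel(treated, treated, sigma)
            - 2.0 * mean_kernel(control, treated, sigma))


# Toy one-dimensional representations (made-up values).
control_reps = [[0.0], [0.1], [0.2]]
treated_reps = [[1.0], [1.1], [1.2]]

balanced = mmd_squared(control_reps, control_reps)    # identical samples
unbalanced = mmd_squared(control_reps, treated_reps)  # shifted samples
print(balanced, unbalanced)
```

Identical samples give an estimate of zero, while the shifted sample gives a strictly positive value, which is the quantity a balancing penalty would push down.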


A representation that reduces the discrepancy between the
control and the treated populations keeps the model from
relying on features that are unbalanced across the two
populations when it infers the counterfactual outcomes. For
instance, if almost no male student in an experiment ever
received intervention A, inferring how male students would
react to intervention A is highly prone to error, and a more
conservative use of the gender feature might be warranted.


After balancing the feature representations of the control 
and the treated populations, the next step is to infer the 
treatment effect for participant x. We adopt the residual 
block [2] to estimate the treatment effect. 


As shown on the right side of Figure 1, $F(x)$ is the
underlying desired function mapping. Instead of stacking a
number of layers to fit the desired $F(x)$ directly, we let
stacked fully connected layers learn the residual mapping
$\Delta f(x) = F(x) - x$. The original mapping is then recast
as $\Delta f(x) + x$. The operation $\Delta f(x) + x$ is
performed by a shortcut connection and an element-wise
addition. Learning the residual mapping is favored over
fitting the desired mapping directly, because it is easier to
find the residual with reference to an identity mapping than
to learn the mapping from scratch.
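The shortcut connection and element-wise addition can be sketched in plain Python as follows. The two-layer shape of the block and all weight values are illustrative assumptions, not the configuration used in the experiments.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, z)."""
    return [max(0.0, z) for z in v]


def linear(v: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(x: Vector, w1: Matrix, b1: Vector,
                   w2: Matrix, b2: Vector) -> Vector:
    """Return x + delta(x), where delta is two stacked fully connected layers."""
    hidden = relu(linear(x, w1, b1))
    delta = linear(hidden, w2, b2)
    # Shortcut connection followed by element-wise addition.
    return [xi + di for xi, di in zip(x, delta)]


x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
no_bias = [0.0, 0.0]

unchanged = residual_block(x, zeros, no_bias, zeros, no_bias)
shifted = residual_block(x, identity, no_bias, half, no_bias)
print(unchanged, shifted)
```

With all-zero weights the block reduces to the identity mapping, which is why a small residual is easy to represent: the layers only need to learn the deviation from x.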


The goal of the residual block is to approximate a residual
function $\Delta f$ such that
$f_T(x) = f_C(x) + \Delta f(f_C(x))$, where $f_C$ is the deep
representation of participant $x$ before it is fed into the
output layer, and $f_T$ is the input to the output layer for
the treated population. The output layer is a ridge linear
regression that generates the final outcome. From the
definition of the residual function $\Delta f$, we see that
$\Delta f(x)$ is the estimated treatment effect for
participant $x$, which is the quantity of interest in a
control and treated experiment. With the residual block
directly connected to fc2, the residual function
$\Delta f(x)$ depends on the feature representation of
participant $x$.


Proceedings of the 10th International Conference on Educational Data Mining


We plug the residual block (shown in Figure 1) between the
fc2 layer and the final output layer for the treated
population in order to estimate the ITE. No residual block
is plugged in between the fc2 layer and the final output
layer for the control population. The final output layer
$g(\cdot)$ is a linear regression that calculates the
predicted outcome, such that


$$y_C = g(f_C(x)), \quad \text{and} \quad y_T = g(f_T(x)).$$


Recall from the problem setup described above that there are
$n$ samples $\{(x_i, t_i, y_i^F)\}_{i=1}^{n}$, where
$y_i^F = t_i \cdot Y_1(x_i) + (1 - t_i) Y_0(x_i)$. In the
control and the treated setting, we assume that $n_c$
($n_c > 0$) samples $\{(x_i, 0, y_i^F)\}_{i=1}^{n_c} \sim D_c$
are assigned to the control ($t = 0$), and $n_t$ ($n_t > 0$)
samples $\{(x_i, 1, y_i^F)\}_{i=1}^{n_t} \sim D_t$ are assigned
to the treated ($t = 1$), such that $n = n_c + n_t$. As
described above, the RCN integrates deep feature learning,
feature representation balancing, and treatment effect
estimation in an end-to-end fashion with the following loss
function:


$$\min_{f_T = f_C + \Delta f(f_C)} \ \frac{1}{n_c} \sum_{i=1}^{n_c} L\big(f_C(x_i), y_i^{(0)}\big) + \frac{1}{n_t} \sum_{i=1}^{n_t} L\big(f_T(x_i), y_i^{(1)}\big) + \alpha \cdot \mathrm{IPM}(D_c, D_t),$$


where $y_i^{(0)}$ and $y_i^{(1)}$ are the factual outcomes of
the control and the treated samples, $\alpha$ is the tradeoff
parameter for the IPM penalty, and $L$ is the loss function
of the model. In the case of binary classification, $L$ is
the standard cross entropy. In the case of regression, $L$ is
the root-mean-square error (RMSE). During training, the model
has access only to the factual outcomes.
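The structure of the objective (a factual loss on the control samples, a factual loss on the treated samples, and a weighted IPM penalty) can be sketched as below. Several choices here are assumptions made only for illustration: the tradeoff weight is called `alpha`, the per-sample loss is a squared error, and the IPM is replaced by the Euclidean distance between mean representations (the MMD under a linear kernel). The paper itself uses cross entropy or RMSE together with the MMD or the Wasserstein distance.

```python
import math
from typing import List

Vector = List[float]


def mean_squared_error(predictions: List[float], targets: List[float]) -> float:
    """Average squared error over one group of factual samples."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def mean_vector(rows: List[Vector]) -> Vector:
    """Coordinate-wise mean of a list of representations."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def mean_difference_ipm(reps_control: List[Vector],
                        reps_treated: List[Vector]) -> float:
    """Stand-in IPM: distance between the two mean representations."""
    mean_c = mean_vector(reps_control)
    mean_t = mean_vector(reps_treated)
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(mean_c, mean_t)))


def rcn_style_objective(pred_control: List[float], y_control: List[float],
                        pred_treated: List[float], y_treated: List[float],
                        reps_control: List[Vector], reps_treated: List[Vector],
                        alpha: float) -> float:
    """Control loss + treated loss + alpha * IPM(control, treated)."""
    loss_control = mean_squared_error(pred_control, y_control)
    loss_treated = mean_squared_error(pred_treated, y_treated)
    penalty = mean_difference_ipm(reps_control, reps_treated)
    return loss_control + loss_treated + alpha * penalty


# Made-up predictions, factual outcomes, and representations.
objective = rcn_style_objective(
    pred_control=[1.0, 2.0], y_control=[1.0, 3.0],
    pred_treated=[2.0], y_treated=[1.0],
    reps_control=[[0.0, 0.0], [2.0, 0.0]], reps_treated=[[1.0, 3.0]],
    alpha=0.1,
)
print(objective)
```

Only factual outcomes appear in the two loss terms, in line with the statement that the model never sees counterfactual outcomes during training.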


4. RELATED WORK 


From a conceptual point of view, our work is inspired by
the work on domain adaptation and deep residual learning.
[6] proposed the Residual Transfer Network, which adopts the
MMD distance to learn transferable deep features from labeled
data in the source domain and unlabeled data in the target
domain, and adds a residual block to transfer the prediction
classifier from the target domain to the source domain. The
structure of our model is similar to that of their model.
Deep residual learning was introduced by [2], the winner of
the ImageNet ILSVRC 2015 challenge, to ease the training of
deep networks. The residual block is designed to learn
residual functions $\Delta F(x)$ with reference to the layer
input $x$. Reformulating layers as a residual block makes
training easier than directly learning the original function
$F(x) = \Delta F(x) + x$.


Our model extends the work of [5, 13], where the authors
build a connection between domain adaptation and
counterfactual inference. They use IPMs, such as the MMD and
the Wasserstein distance, to learn a representation of the
data that balances the control and treated distributions.
The treatment assignment is concatenated with the
representation to predict the factual outcome, while the
reverse treatment assignment is concatenated with the
representation to predict the counterfactual outcome.
Compared to their work, we add a residual block to estimate
the individual treatment effect based on the representation.
[17, 1] proposed random causal forests (RCF), which build
upon the idea of random forests to estimate the heterogeneous
treatment effect.


5. EXPERIMENTS 


5.1 Evaluation Metrics 
To compare the various models, we report the RMSE of the
estimated individual treatment effect, denoted


$$\epsilon_{ITE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \Big( \big(Y_1(x_i) - Y_0(x_i)\big) - \widehat{ITE}(x_i) \Big)^2},$$


and the absolute error in the average treatment effect


$$\epsilon_{ATE} = \left| \frac{1}{n} \sum_{i=1}^{n} \big(f(x_i, 1) - f(x_i, 0)\big) - \frac{1}{n} \sum_{i=1}^{n} \big(Y_1(x_i) - Y_0(x_i)\big) \right|.$$


Following [4, 5], we report the Precision in Estimation of
Heterogeneous Effect (PEHE),


$$\mathrm{PEHE} = \frac{1}{n} \sum_{i=1}^{n} \Big( \big(Y_1(x_i) - Y_0(x_i)\big) - \big(\hat{y}_1(x_i) - \hat{y}_0(x_i)\big) \Big)^2.$$


A small $\epsilon_{ITE}$ requires accurate estimation of the
counterfactual responses only, whereas a good (small) PEHE
requires accurate estimation of both the factual and the
counterfactual responses.
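When the true effects are known, as in a semi-simulated dataset, the three metrics reduce to a few lines of arithmetic. The sketch below takes the true per-participant effects Y_1(x_i) - Y_0(x_i) and the estimated effects as plain lists; the function names and the numbers are illustrative assumptions.

```python
import math
from typing import List


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def rmse_ite(true_effects: List[float], estimated_ite: List[float]) -> float:
    """Root-mean-square error between true effects and estimated ITEs."""
    return math.sqrt(mean([(t - e) ** 2
                           for t, e in zip(true_effects, estimated_ite)]))


def abs_error_ate(true_effects: List[float],
                  predicted_effects: List[float]) -> float:
    """Absolute difference between predicted and true average effects."""
    return abs(mean(predicted_effects) - mean(true_effects))


def pehe(true_effects: List[float], predicted_effects: List[float]) -> float:
    """Mean squared difference between true and predicted effects."""
    return mean([(t - p) ** 2
                 for t, p in zip(true_effects, predicted_effects)])


# Made-up values: true effects, and effects predicted as f(x, 1) - f(x, 0).
true_effects = [1.0, 2.0]
predicted_effects = [1.0, 4.0]

print(rmse_ite(true_effects, predicted_effects))       # square root of 2
print(abs_error_ate(true_effects, predicted_effects))  # |2.5 - 1.5|
print(pehe(true_effects, predicted_effects))           # (0 + 4) / 2
```

The same list is passed to all three functions here for brevity. In the definitions above, the estimated ITE for the RMSE uses the factual outcome for one arm, while the PEHE uses model predictions for both arms.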


However, calculating $\epsilon_{ITE}$, $\epsilon_{ATE}$, and
PEHE requires the "ground truth" ITE of each participant in
the experiment. We cannot gather the counterfactual outcomes
from RCTs and observational studies, and thus do not have the
ITE of each participant, so we cannot evaluate
$\epsilon_{ITE}$ and PEHE on these datasets. To compare the
performance of various models on these datasets, we use a
measure called policy risk, introduced by [13]. Given a model
$f$, the participant $x$ is assigned to the treatment,
$\pi_f(x) = 1$, if $f(x, 1) - f(x, 0) > \lambda$ (in the case
of the RCN, $\Delta f > \lambda$), where $\lambda$ is the
treatment threshold, and to the control, $\pi_f(x) = 0$,
otherwise. The policy risk is defined as:


$$R_{pol}(\pi_f) = 1 - \big( E[Y_1 \mid \pi_f(x) = 1] \cdot p(\pi_f = 1) + E[Y_0 \mid \pi_f(x) = 0] \cdot p(\pi_f = 0) \big).$$


The empirical estimator of the policy risk on a dataset is
calculated by:


$$\hat{R}_{pol}(\pi_f) = 1 - \big( E[Y_1 \mid \pi_f(x) = 1, t = 1] \cdot p(\pi_f = 1) + E[Y_0 \mid \pi_f(x) = 0, t = 0] \cdot p(\pi_f = 0) \big).$$


Figure 2: CFR for ITE estimation. L is a loss function,
IPM is an integral probability metric.


To obtain the policy risk, we use the method introduced by 
[16]. We select a subset of participants in the dataset where 
the treatment recommendation inferred by the model is the 
same as the treatment assignment in the experiment and 
then calculate the average loss from the subset of the data 
(see Table 1 for illustrative data). 
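The selection of congruent participants can be made concrete with a short sketch that applies the empirical estimator to the four hypothetical students of Table 1. The function name, the record layout, and the decision to skip an arm that has no congruent participants are assumptions of this example.

```python
from typing import List, Tuple

# (assigned treatment t, observed outcome y, predicted treatment effect)
Record = Tuple[int, float, float]


def empirical_policy_risk(records: List[Record], threshold: float = 0.0) -> float:
    """1 - sum over arms of (mean outcome of congruent participants * share recommended)."""
    n = len(records)
    # Recommend treatment when the predicted effect exceeds the threshold.
    recommended = [1 if effect > threshold else 0 for _, _, effect in records]
    value = 0.0
    for arm in (0, 1):
        share = sum(1 for r in recommended if r == arm) / n
        # Congruent: the model recommends this arm AND the experiment assigned it.
        congruent = [y for (t, y, _), r in zip(records, recommended)
                     if r == arm and t == arm]
        if congruent:  # assumption: an arm with no congruent data contributes 0
            value += (sum(congruent) / len(congruent)) * share
    return 1.0 - value


# The four hypothetical students from Table 1.
table1_students = [
    (0, 1.0, 0.05),   # ID 1: control, completed, recommended treatment
    (0, 0.0, -0.15),  # ID 2: control, not completed, recommended control
    (1, 0.0, 0.12),   # ID 3: treated, not completed, recommended treatment
    (1, 1.0, -0.08),  # ID 4: treated, completed, recommended control
]

risk_at_zero = empirical_policy_risk(table1_students, threshold=0.0)
risk_not_treated = empirical_policy_risk(table1_students, threshold=0.2)
print(risk_at_zero, risk_not_treated)
```

At threshold 0 only students 2 and 3 are congruent, and neither completed the assignment, so the risk for this toy data is 1.0. At threshold 0.2 every student is recommended the control, the policy coincides with "Not Treated", and the risk is 0.5.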


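For the datasets without the "ground truth" ITE, we also
calculate the average treatment effect on the treated by


$$\mathrm{ATT} = \frac{1}{n_t} \sum_{i=1}^{n_t} y_i^{(1)} - \frac{1}{n_c} \sum_{i=1}^{n_c} y_i^{(0)},$$


where $y_i^{(1)}$ and $y_i^{(0)}$ are the observed outcomes of
the treated and the control participants, and report the
error on ATT as


$$\epsilon_{ATT} = \left| \mathrm{ATT} - \frac{1}{n_t} \sum_{i=1}^{n_t} \big(f(x_i, 1) - f(x_i, 0)\big) \right|.$$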
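As a small worked example, the ATT and its error can be computed from observed outcomes and predicted effects as follows. All numbers are made up for illustration and do not come from the ASSISTments data.

```python
from typing import List


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def att(y_treated: List[float], y_control: List[float]) -> float:
    """Mean observed outcome of the treated minus that of the control."""
    return mean(y_treated) - mean(y_control)


def att_error(att_value: float, predicted_effects_treated: List[float]) -> float:
    """|ATT - mean predicted effect f(x, 1) - f(x, 0) over the treated|."""
    return abs(att_value - mean(predicted_effects_treated))


# Hypothetical completion outcomes (1 = completed) for each group.
y_treated = [1.0, 1.0, 0.0, 1.0]
y_control = [1.0, 0.0, 0.0, 0.0]
# Hypothetical predicted effects for the four treated participants.
predicted_effects_treated = [0.4, 0.6, 0.3, 0.5]

att_value = att(y_treated, y_control)                       # 0.75 - 0.25
error = att_error(att_value, predicted_effects_treated)     # |0.5 - 0.45|
print(att_value, error)
```

Because the ATT only needs observed outcomes, this error can be reported on randomized experiments where the true ITE of each participant is unavailable.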


5.2 Baselines 

Balancing Neural Networks (BNN) is a neural-network-based
model for counterfactual inference. Compared to the RCN, it
has exactly the same fc1 and fc2 layers with an IPM
regularizer to learn the representation $\Phi(x)$ of
participant $x$. However, instead of using a residual block
to estimate the treatment effect, it concatenates the
treatment assignment $t_i$ to the output of the fc2 layer,
$\Phi(x_i)$, and feeds $[\Phi(x_i), t_i]$ to another two fully
connected layers to generate the predicted outcome. We refer
to this particular structure of BNN as BNN-2-2, following [5].


The Counterfactual Regression (CFR) [13] is built on the
BNN. The important difference between the two models is that
the CFR uses a more powerful distribution metric, in the form
of IPMs, to learn a balancing representation. We compare our
model with BNN-2-2 and CFR to verify the efficacy of the
residual block in estimating the individual treatment effect.


We introduce a simple neural network baseline to evaluate
the efficacy of the IPM regularizer and the residual mapping.
This baseline is a feed-forward neural network with four
hidden layers, trained to predict the factual outcome based
on X and t, without the IPM regularizer and the residual
block. We refer to this as NN-4.


Table 1: Hypothetical data for some example students. The predicted outcome is the probability that the
student would complete the assignment. Students marked with an asterisk (set in bold in the original layout)
are those whose randomized treatment assignment is congruent with the recommendation of the counterfactual
inference model. Data from these students would be used to calculate the policy risk.

ID   Group      Completion  Predicted outcome  Predicted outcome  Treatment  Treat?
                            if treated         if not treated     effect
1    Control    1           0.8                0.75                0.05      1
2*   Control    0           0.3                0.45               -0.15      0
3*   Treatment  0           0.50               0.38                0.12      1
4    Treatment  1           0.91               0.99               -0.08      0


5.3 Simulation based on real data - IHDP
The Infant Health and Development Program (IHDP) dataset
is a semi-simulated dataset introduced by [4]. The dataset
consists of a number of covariates from a real randomized
experiment. The goal of the experiment was to study the
impact of superior child care and home visits on future
cognitive test scores. [4] discarded a biased subset of the
treated population in order to introduce imbalance between
treated and control subjects, and used a simulated
counterfactual outcome. The resulting dataset has 747
subjects (139 treated, 608 control), each represented by 25
covariates describing attributes of the children and their
mothers.


5.4 ASSISTments dataset


The ASSISTments online learning platform [3] is a free
web-based platform used by a large user base of teachers and
students. The platform was the subject of a recent study
within the state of Maine [9], which demonstrated significant
learning gains for students using the platform. The dataset
used in this work comes from one of 22 randomized controlled
experiments [12] collected within the platform. This
experiment was run in assignment types known as "skill
builders," in which students are given problems until a
threshold of understanding is reached; within ASSISTments,
this threshold is traditionally three consecutive correct
responses. Reaching this threshold denotes sufficient
performance and completion of the assignment. In addition to
this experimental data, information about the students prior
to condition assignment is also provided in the form of
problem-level log data, which offers a breadth of student
information at fine levels of granularity.


In this experiment, two kinds of hints (video versus text)
are available for each problem in the assignment when
students answer the problem incorrectly. Assignment to the
video hint or the text hint was random. Video content was
designed to mirror the text hint in an attempt to provide
identical assistance. There are 147 students who received the
video hint and 237 students who received the text hint. The
dataset includes 15 covariates, such as student
past-performance history and class past-performance history.
We solve a binary classification task, which is to predict
the completion of the assignment for each student.


6. RESULTS

The results on IHDP are presented in Table 2 for the
treatment threshold $\lambda = 0$. We see that our proposed
RCN performs the best on the dataset in terms of
$\epsilon_{ITE}$, $\epsilon_{ATE}$, and PEHE. The improvement
is especially large for the ITE estimate. These results
indicate that the residual block $\Delta f(x)$ helps
accurately predict the value of the ITE based on the feature
representation $\Phi(x)$ for a given participant $x$.


Table 2: Results of IHDP

Model     $\epsilon_{ITE}$   $\epsilon_{ATE}$   PEHE
NN-4      2.0                0.5                1.9
BNN-2-2   1.7                0.3                1.6
CFR       1.4                0.2                1.6
RCN       1.1                0.05               1.4


The results on the ASSISTments dataset are the main interest
of our work, since we hope to apply the RCN to educational
experiments in order to support decision making for
personalized learning. The results in terms of policy risk
and the error on the average treatment effect on the treated
are shown in Table 3 for the treatment threshold
$\lambda = 0$. The model TA means "Treated All," where all
students are assigned to the treatment, while the model NT
means "Not Treated," where all students are assigned to the
control. If we ignore that the effects of an intervention may
differ for individual students, the better-performing of
these two models would be adopted when a choice must be made
between the two interventions. The RCN, which considers the
individual treatment effect, outperforms both the TA and the
NT. This indicates that taking the individual effect into
account helps make a better choice of interventions. The
comparison between the CFR and the RCN suggests that the RCN
performs better than the CFR in terms of both policy risk and
the error on ATT.


To investigate the relationship between policy risk and the
treatment threshold $\lambda$, we plot the policy risk as a
function of $\lambda$ in Figure 3. For the results of the
ASSISTments dataset from the CFR, the maximum predicted ITE
in the dataset is 0.44. Once the threshold $\lambda$ is
larger than 0.44, the CFR is converted to "Not Treated,"
where all students are assigned to the control. Since the
maximum predicted ITE in the ASSISTments dataset from the
CFR is 0.18, the CFR is converted to "Not Treated" once the
treatment threshold $\lambda$ is larger than 0.18.


Figure 3: Treatment threshold $\lambda$ (x-axis) versus policy
risk (y-axis) on the ASSISTments dataset. Lower policy risk
is better.


Table 3: Results of the ASSISTments Dataset

Model   $R_{pol}$   $\epsilon_{ATT}$
TA      0.14        -
NT      0.27        -
CFR     0.14        0.08
RCN     0.08        0.03


7. CONCLUSION


As online educational experiments become popular and easy
to conduct, and machine learning becomes a major tool for
researchers, counterfactual inference is gaining a lot of
interest for the purpose of personalized learning. In this
paper we propose the Residual Counterfactual Networks (RCN)
to estimate the individual treatment effect. Because the
distributions of the control and the treated populations may
differ, the RCN uses IPMs, such as the Wasserstein and MMD
distances, to learn balancing deep features from the data. A
residual block is applied to the deep features to learn the
individual treatment effect (ITE), so that the estimate of
the ITE depends on the deep features. We apply our model to
both synthetic datasets and real-world datasets from online
educational experiments; the results indicate that our model
achieves state-of-the-art performance.


One open question for future work is how to generalize our
model to situations where there is more than one treatment
in the experiment. An Integral Probability Metric (IPM) can
only measure the distance between two distributions. We
could use pairwise IPMs if there are more than two
distributions, but this would become computationally
expensive as the number of distributions increases. Since
running experiments is expensive and collecting enough data
for the model to make reliable predictions is difficult, we
need a better optimization algorithm that allows us to train
the model efficiently.